-
Notifications
You must be signed in to change notification settings - Fork 173
[cryptography/bloomfilter] generic Hasher and optimizations #2729
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Deploying with
|
| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ✅ Deployment successful! View logs |
commonware-mcp | eb675f6 | Jan 19 2026, 08:18 PM |
|
cryptography/src/bloomfilter/mod.rs
Outdated
| /// | ||
| /// The number of bits will be rounded up to the next power of 2. | ||
| pub fn new(hashers: NonZeroU8, bits: NonZeroUsize) -> Self { | ||
| let bits = bits.get().next_power_of_two(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decided to just round up to the next power of two rather than crashing here. Let me know if you prefer asserting.
Deploying monorepo with
|
| Latest commit: |
eb675f6
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://db94ad89.monorepo-eu0.pages.dev |
| Branch Preview URL: | https://andre-bloomfilter-improvemen.monorepo-eu0.pages.dev |
0ebf814 to
5589649
Compare
|
bugbot run |
patrick-ogrady
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial comments
cryptography/src/bloomfilter/mod.rs
Outdated
| // Extract two 64-bit hash values from the SHA256 digest of the item | ||
| let digest = Sha256::hash(item); | ||
| let h1 = u64::from_be_bytes(digest[0..8].try_into().unwrap()); | ||
| let h2 = u64::from_be_bytes(digest[8..16].try_into().unwrap()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the rationale for u64 instead? I opted for u128 for security.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
u64 arithmetic is faster than u128. My rationale was that bloom filters do not rely on collision resistance or large hash domains. All entropy beyond log2(m) bits is discarded during indexing, so using u128 does not improve security or false-positive behavior. The cryptographic hash should already provide more than enough uniformity.
cryptography/src/bloomfilter/mod.rs
Outdated
| let digest = Sha256::hash(item); | ||
| let h1 = u64::from_be_bytes(digest[0..8].try_into().unwrap()); | ||
| let mut h2 = u64::from_be_bytes(digest[8..16].try_into().unwrap()); | ||
| h2 |= 1; // make sure h2 is non-zero |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have to be the unluckiest person in the universe for this to happen, but the Kirsch-Mitzenmacher optimization assumes that h2 is non-zero.
|
I updated the benchmarks, here's a comparison of the impact of the optimizations: The optimizations have a big impact for small inputs size, for larger inputs computing the hash will dominate. Either way, this is consistently faster on all benchmarks. |
| /// | ||
| /// - **Performance**: Hash function performance varies with the size of items inserted | ||
| /// and queried. [Sha256] is faster for smaller items (up to ~2KB), while | ||
| /// [Blake3](crate::blake3::Blake3) is faster for larger items (4KB+). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When adding Blake3, we found it was faster for both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is based on the benchmarks I added for bloom filter that run with both hash functions.
|
|
||
| impl<H: Hasher> BloomFilter<H> { | ||
| /// Compile-time assertion that the digest is at least 16 bytes. | ||
| const _ASSERT_DIGEST_AT_LEAST_16_BYTES: () = assert!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wonder if this is overkill
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is not true then we'll panic:
let h1 = u64::from_be_bytes(digest[0..8].try_into().unwrap());
let mut h2 = u64::from_be_bytes(digest[8..16].try_into().unwrap());
IMO makes sense to verify at compile-time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good
patrick-ogrady
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
Codecov Report❌ Patch coverage is
@@ Coverage Diff @@
## main #2729 +/- ##
==========================================
+ Coverage 92.24% 92.25% +0.01%
==========================================
Files 376 376
Lines 119098 119408 +310
==========================================
+ Hits 109856 110165 +309
- Misses 9242 9243 +1
... and 5 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
This PR makes the
BloomFiltergeneric overHasher, defaulting to Sha256.To support practical bloom filter sizing, a new
with_rateconstructor computes the optimal number of bits and hash functions for a given expected number of items and target false positive rate. Internally, this usesBigRationalto ensure deterministic results across platforms, important for consensus. This is the recommended way to construct bloom filters in most cases, as it automatically balances space efficiency against the desired accuracy.The index calculation has been optimized to use 64-bit arithmetic instead of 128-bit, and the bit count is now always rounded up to the next power of two. This allows using a bitmask instead of modulo for index calculation, which is faster.
Benchmarks were added to measure that the optimizations above work as expected. They compare Sha256 and Blake3 performance across different item sizes (32, 2048, 4096 bytes) and false positive rates (10%, 0.1%).